Skip to content

stream: fix UTF-8 character corruption in fast-utf8-stream#61745

Open
mcollina wants to merge 1 commit intonodejs:mainfrom
mcollina:fix-utf8-stream-partial-write
Open

stream: fix UTF-8 character corruption in fast-utf8-stream#61745
mcollina wants to merge 1 commit intonodejs:mainfrom
mcollina:fix-utf8-stream-partial-write

Conversation

@mcollina
Copy link
Member

@mcollina mcollina commented Feb 8, 2026

Summary

Fix releaseWritingBuf() to correctly handle partial writes that split multi-byte UTF-8 characters.

The previous implementation incorrectly converted byte counts to character counts by using:

n = Buffer.from(writingBuf).subarray(0, n).toString().length;

When n bytes cuts through a multi-byte character, the incomplete UTF-8 sequence becomes U+FFFD (replacement character) via .toString(), which has a different .length than the original character. This caused:

  • 3-byte characters (CJK) to be silently dropped
  • 4-byte characters (emoji) to leave lone surrogates in the buffer

The fix backs up from the byte position to find a valid UTF-8 character boundary by checking for continuation bytes (pattern 10xxxxxx), then decodes the properly-aligned bytes to get the correct character count.

Also fixes a typo where this._asyncDrainScheduled was used instead of the private field this.#asyncDrainScheduled.

Fixes: #61744

Fix releaseWritingBuf() to correctly handle partial writes that split
multi-byte UTF-8 characters. The previous implementation incorrectly
converted byte counts to character counts, causing:

- 3-byte characters (CJK) to be silently dropped
- 4-byte characters (emoji) to leave lone surrogates in the buffer

The fix backs up from the byte position to find a valid UTF-8 character
boundary by checking for continuation bytes (pattern 10xxxxxx), then
decodes the properly-aligned bytes to get the correct character count.

Also fixes a typo where this._asyncDrainScheduled was used instead of
the private field this.#asyncDrainScheduled.

Fixes: nodejs#61744
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/streams

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. stream Issues and PRs related to the stream subsystem. labels Feb 8, 2026
@codecov
Copy link

codecov bot commented Feb 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.74%. Comparing base (f77a709) to head (4f5cf65).
⚠️ Report is 44 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61745      +/-   ##
==========================================
- Coverage   89.74%   89.74%   -0.01%     
==========================================
  Files         674      675       +1     
  Lines      204360   204537     +177     
  Branches    39265    39308      +43     
==========================================
+ Hits       183412   183569     +157     
- Misses      13251    13257       +6     
- Partials     7697     7711      +14     
Files with missing lines Coverage Δ
lib/internal/streams/fast-utf8-stream.js 74.57% <100.00%> (+0.33%) ⬆️

... and 44 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-ci PRs that need a full CI run. stream Issues and PRs related to the stream subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UTF-8 character corruption in fast-utf8-stream.js via releaseWritingBuf() leads to silent data loss on partial writes

3 participants